ABSTRACT

Solar flares occur in complex sunspot groups, but it remains unclear how the probability of
producing a flare of a given magnitude relates to the characteristics of the sunspot group. Here, we use
Geostationary Operational Environmental Satellite X-ray flares and McIntosh group classifications from
solar cycles 21 and 22 to calculate average flare rates for each McIntosh class and use these to
determine Poisson probabilities for different flare magnitudes. Forecast verification measures are studied
to find optimum thresholds to convert Poisson flare probabilities into yes/no predictions of cycle 23
flares. A case is presented to adopt the true skill statistic (TSS) as a standard for forecast comparison
over the commonly used Heidke skill score (HSS). In predicting flares over 24 hr, the maximum values
of TSS achieved are 0.44 (C-class), 0.53 (M-class), 0.74 (X-class), 0.54 (≥M1.0), and 0.46 (≥C1.0).
The maximum values of HSS are 0.38 (C-class), 0.27 (M-class), 0.14 (X-class), 0.28 (≥M1.0), and 0.41
(≥C1.0). These show that Poisson probabilities perform comparably to some more complex prediction
systems, but the overall inaccuracy highlights the problem with using average values to represent
flaring rate distributions.

Subject headings: magnetic fields — Sun: activity — Sun: flares — sunspots



1. INTRODUCTION 

Solar flares result from the release of enormous quantities of energy (up to ~10^25 J; Kane et al. 2005)
from twisted, non-potential magnetic fields. Along with coronal mass ejections (CMEs), flares are a
major contributor to space weather that adversely affects the near-Earth environment
(Hapgood & Thomson 2010). The magnetic energy to power solar flares is stored primarily in active
regions (ARs) that are routinely classified in terms of complexity. The Mount Wilson scheme
(Hale et al. 1919; Künzel 1960) describes magnetic polarity mixing, while the McIntosh (1990) scheme
describes spatial structuring of the magnetic field "footprints" in sunspot groups. We concentrate on
the McIntosh scheme that allows up to 60 classes, yielding reasonable resolution in terms of the
observed structural complexity. In contrast, the Mount Wilson scheme allows up to eight classes, each
with flare rate distributions broader than those of the McIntosh classes.

Recent years have seen a resurgence in the field of solar flare prediction. A sample of the techniques
employed includes Poisson statistics (Gallagher et al. 2002), Bayesian statistics (Wheatland 2005),
support vector machines (Li et al. 2007), discriminant analysis (Barnes et al. 2007), ordinal logistic
regression (Song et al. 2009; Yuan et al. 2010), neural networks (Colak & Qahwaji 2009;
Yu et al. 2009; Ahmed et al. 2012), wavelet predictors (Yu et al. 2010a), Bayesian networks
(Yu et al. 2010b), predictor teams (Huang et al. 2010), superposed epoch analysis
(Mason & Hoeksema 2010), and empirical projections (Falconer et al. 2011). It is worth noting that
none of these techniques are based on physical models of the flare process. Most of the methods give a
probability for an X-ray flare with peak flux above some magnitude in a time interval. If the aim of
a prediction method is to provide a result that can be readily interpreted as "flare imminent" or
"no flare expected", the predicted probabilities need to be converted into yes/no forecasts and the
forecast success determined. However, it is extremely important that appropriate performance measures
are used when comparing the success of different forecasts.

In this Letter, a case is presented for the adoption of an existing (but rarely utilized) performance
measure for comparisons between different solar flare forecasts (Section 2). As an example, we
investigate the performance of Poisson probabilities in predicting X-ray flares from ARs within 24 hr
of a McIntosh classification being issued. The data and their sources are detailed in Section 3, while
the method to determine forecast performance is described in Section 4. The effect of varying the
threshold that is used in converting Poisson probabilities into yes/no predictions is studied in
Section 5.1, while optimum performance measures are compared to the performance of other methods in
Section 5.2. Finally, our conclusions and ideas for further work are given in Section 6.

2. FORECAST PERFORMANCE MEASURES 

The success of a forecast method that provides yes/no forecasts should be studied using a forecast
contingency table and calculating verification measures (an excellent comparison of different
evaluation measures is given in Woodcock 1976). Quantitative measures are essential to compare the
relative performance of different prediction methods. The flare forecast contingency table format is
presented in Table 1, containing the elements TP (true positives, "flare" predicted and observed),
FN (false negatives, "no flare" predicted and flare observed), FP (false positives, "flare" predicted
and none observed), and TN (true negatives, "no flare" predicted and none observed).

TABLE 1
Flare Forecast Contingency Table

                   Flare Forecast
Flare Observed     "Flare"     "No flare"
Yes                TP          FN
No                 FP          TN

Numerous skill scores exist to quantify the performance of forecasts, but the Heidke (1926) skill
score (HSS),

$$ \mathrm{HSS} = \frac{2\,[(\mathrm{TP} \times \mathrm{TN}) - (\mathrm{FN} \times \mathrm{FP})]}{(\mathrm{TP} + \mathrm{FN})(\mathrm{FN} + \mathrm{TN}) + (\mathrm{TP} + \mathrm{FP})(\mathrm{FP} + \mathrm{TN})} \, , \qquad (1) $$

is most frequently used in flare forecasting (e.g., Barnes & Leka 2008). The strength of the HSS lies
in its use of the whole contingency table to quantify the accuracy of achieving correct predictions
relative to random chance. The Hanssen & Kuipers (1965) discriminant, known as the true skill
statistic (TSS), also uses all of the elements,

$$ \mathrm{TSS} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} - \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \, . \qquad (2) $$



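However, only TSS is unbiased when confronted with varying event/no-event sample ratios
(Woodcock 1976). This is demonstrated by considering a new forecast that achieves the same prediction
success with two times the number of flare ARs (i.e., TP_new = 2TP; FN_new = 2FN;
TP_new/FN_new = TP/FN). Equation (1) becomes,

$$ \mathrm{HSS}_{\mathrm{new}} = \frac{2\,[(2\mathrm{TP} \times \mathrm{TN}) - (2\mathrm{FN} \times \mathrm{FP})]}{(2\mathrm{TP} + 2\mathrm{FN})(2\mathrm{FN} + \mathrm{TN}) + (2\mathrm{TP} + \mathrm{FP})(\mathrm{FP} + \mathrm{TN})} \neq \mathrm{HSS} \, , \qquad (3) $$

while Equation (2) becomes,

$$ \mathrm{TSS}_{\mathrm{new}} = \frac{2\mathrm{TP}}{2\mathrm{TP} + 2\mathrm{FN}} - \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} - \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = \mathrm{TSS} \, . \qquad (4) $$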



This simple example shows that HSS changes despite the 
prediction success being held constant, highlighting the 
problem with using HSS to compare between different 
methods (or different trials of the same method). Note 
that we do not dismiss the usefulness of HSS as a measure 
within a particular forecast method trial. However, we 
propose TSS to be the standard measure for comparing 
between flare forecasts, given that different studies use
differing flare/no-flare sample ratios.
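
The argument of Equations (1)-(4) can be checked numerically. The following is a minimal sketch rather than code from this work: the class and function names are illustrative assumptions, and the contingency table elements are those listed in Table 3 for M-class forecasts at a 10% threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Elements of a yes/no forecast contingency table (Table 1)."""
    tp: float  # "flare" predicted, flare observed
    fn: float  # "no flare" predicted, flare observed
    fp: float  # "flare" predicted, none observed
    tn: float  # "no flare" predicted, none observed


def hss(c: Contingency) -> float:
    """Heidke skill score, Equation (1)."""
    numerator = 2.0 * (c.tp * c.tn - c.fn * c.fp)
    denominator = (c.tp + c.fn) * (c.fn + c.tn) + (c.tp + c.fp) * (c.fp + c.tn)
    return numerator / denominator


def tss(c: Contingency) -> float:
    """True skill statistic, Equation (2)."""
    return c.tp / (c.tp + c.fn) - c.fp / (c.fp + c.tn)


# M-class forecast at a 10% threshold (elements as listed in Table 3).
base = Contingency(tp=568, fn=242, fp=3832, tn=17634)
# Same prediction success with twice as many flare ARs (TP and FN doubled).
doubled = Contingency(tp=2 * base.tp, fn=2 * base.fn, fp=base.fp, tn=base.tn)

for label, table in (("original", base), ("doubled flare ARs", doubled)):
    print(f"{label:>18}: HSS = {hss(table):.3f}, TSS = {tss(table):.3f}")
```

With these inputs HSS rises from about 0.17 to about 0.27 when the flare AR sample is doubled, while TSS stays at about 0.52.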

3. DATA SOURCES 

3.1. Training Set 

In order to facilitate the calculation of flare probabilities, we obtained historical flare rates for
each McIntosh class from two locations that share the same data source. The National Oceanic and
Atmospheric Administration (NOAA) Space Weather Prediction Center (SWPC) provided total numbers of
Geostationary Operational Environmental Satellite (GOES) C-, M-, and X-class flares and the
originating ARs for each McIntosh classification over 1988 December 1 to 1996 June 30
(C. C. Balch 2011, private communication). Additional M- and X-class flare and McIntosh class numbers
were taken from Kildahl (1980) over 1969-1976, but relate to the same data source (i.e.,
NOAA-collated ground-based AR observations and GOES flare events). These were included to increase
the rare M- and X-class samples so that the rates were more statistically significant. Table 2
presents the recorded McIntosh classes with the numbers of observed regions and flares produced.

3.2. Testing Set 

The AR and flare data that are used for testing were gathered from the online archives of
NOAA/SWPC.^1 McIntosh classes of regions that have predictions issued and tested were taken from the
daily NOAA Solar Region Summary files over 1996 August 1 to 2010 December 31. In this work, each daily
record of a NOAA region was treated as an individual measurement, yielding 22276 AR samples. GOES
flares with originating NOAA numbers assigned to their entry were extracted from the edited daily
NOAA Solar Event Reports over the same date range as the McIntosh classes. NOAA region numbers
attributed to any associated Hα flares were used for those GOES flares with no NOAA region directly
assigned.

4. ANALYSIS METHOD 

4.1. Historical Poisson Probabilities 

Following Bornmann & Shaw (1994), GOES-class flare rates in 24 hr intervals were calculated for each
McIntosh class by combining the number of flares that classification produced over 1969-1976 and
1988-1996 and dividing by the number of times the McIntosh class was observed in both periods, N_tot.
It should be noted that C-class flares were not provided in Kildahl (1980). In order to provide
C-class related forecasts comparable to those for M- and X-classes, rates measured over 1988-1996
were taken to hold for 1969-1976. The relative numbers of McIntosh observations in the two time
periods were then used to determine the expected number of C-class flares for 1969-1976 (Table 2,
Column 7). The C-, M-, and X-class flare rates combined over 1969-1976 and 1988-1996 are presented in
Columns 10-12 of Table 2, with the error on the average rate (σ = N_tot^{-1/2}) given in Column 13.
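
As an illustration of this bookkeeping, the sketch below reproduces the HKX entries of Table 2 from the region and flare counts. The function names are hypothetical, and the error expression σ = N_tot^{-1/2} is an assumption that is consistent with the tabulated Column 13 values.

```python
def expected_c_flares(c_swpc: float, n_swpc: int, n_kildahl: int) -> float:
    """Expected 1969-1976 C-class flares, assuming the 1988-1996 C-class rate holds."""
    return c_swpc / n_swpc * n_kildahl


def combined_rate(flares_swpc: float, flares_kildahl: float,
                  n_swpc: int, n_kildahl: int) -> float:
    """Average number of flares per 24 hr over both periods."""
    return (flares_swpc + flares_kildahl) / (n_swpc + n_kildahl)


def rate_error(n_tot: int) -> float:
    """Assumed error on the average rate, sigma = N_tot^(-1/2)."""
    return n_tot ** -0.5


# HKX row of Table 2: SWPC (1988-1996) and Kildahl (1969-1976) counts.
n_swpc, c_swpc, m_swpc = 49, 11, 2
n_kildahl, m_kildahl = 38, 7

c_kildahl = expected_c_flares(c_swpc, n_swpc, n_kildahl)        # Column 7
c_rate = combined_rate(c_swpc, c_kildahl, n_swpc, n_kildahl)    # Column 10
m_rate = combined_rate(m_swpc, m_kildahl, n_swpc, n_kildahl)    # Column 11
sigma = rate_error(n_swpc + n_kildahl)                          # Column 13

print(f"C (1969-1976) = {c_kildahl:.1f}, C rate = {c_rate:.2f}, "
      f"M rate = {m_rate:.2f}, sigma = {sigma:.2f}")
```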



To achieve a probability of flaring we follow the Poisson statistics technique of
Gallagher et al. (2002). Under the assumption of flares being a Poisson-distributed process,^2 the
probability of observing N flares in a time interval is related to the average flare rate, μ, over
that interval by,

$$ P_{\mu}(N) = \frac{\mu^{N}}{N!} \exp(-\mu) \, . \qquad (5) $$

When μ is calculated over 24 hr intervals, the probability of observing one or more flares in any
24 hr interval is,

$$ P_{\mu}(N \geq 1) = 1 - P_{\mu}(N = 0) = 1 - \exp(-\mu) \, . \qquad (6) $$



^1 http://www.swpc.noaa.gov/ftpdir/warehouse/

^2 Aschwanden & McTiernan (2010) show that flare waiting times are consistent with a nonstationary
Poisson process. Application of Poisson probability here averages the time-dependent rates in 24 hr
intervals and over the solar cycle.
TABLE 2
McIntosh Classification Flare Statistics

McIntosh   SWPC (1988-1996)        Kildahl (1969-1976)^b   Combined Flare Rate (24 hr^-1)   Poisson Flare Probability (%)
Class^a  Regions  Total Flares    Regions  Total Flares                                    In GOES Class   Above GOES^d
Class       N    C    M    X      N    C^c    M    X      C     M     X     σ     C    M    X  M1.0  C1.0
AXX    2748   82   10  ...   2517   75.1   31    3   0.03  0.01  0.00  0.01     3    1  ...     1     4
BXO    3342  217   18    1   1906  123.8   41    2   0.06  0.01  0.00  0.01     6    1  ...     1     7
BXI     ...  ...  ...  ...    334    0.0   20  ...   0.00  0.06  0.00  0.05   ...    6  ...     6     6
HRX     336   21    1  ...    211   13.2    7    1   0.06  0.01  0.00  0.04     6    1  ...     2     8
HSX    1968   94   21  ...   1963   93.8   99    6   0.05  0.03  0.00  0.02     5    3  ...     3     8
HAX     598   49   13  ...    222   18.2   14  ...   0.08  0.03  0.00  0.03     8    3  ...     3    11
HHX      53    3    1  ...    150    8.5   16    2   0.06  0.08  0.01  0.07     6    8    1     9    14
HKX      49   11    2  ...     38    8.5    7  ...   0.22  0.10  0.00  0.11    20   10  ...    10    28
CRO     745  102    3  ...    368   50.4   20    2   0.14  0.02  0.00  0.03    13    2  ...     2    15
CRI       6    2  ...  ...    152   50.7    7  ...   0.33  0.04  0.00  0.08    28    4  ...     4    31
CSO    1504  284   27  ...   1020  192.6   40    1   0.19  0.03  0.00  0.02    17    3  ...     3    19
CSI      14    8    2  ...    211  120.6   16    2   0.57  0.08  0.01  0.07    44    8    1     9    48
CAO    1455  361   38    2    232   57.6   18    1   0.25  0.03  0.00  0.02    22    3  ...     3    25
CAI      27   14    6  ...    166   86.1   19  ...   0.52  0.13  0.00  0.07    40   12  ...    12    48
CHO      88   21    2    1    112   26.7    8    1   0.24  0.05  0.01  0.07    21    5    1     6    26
CHI       2    1  ...  ...     29   14.5    6  ...   0.50  0.19  0.00  0.18    39   18  ...    18    50
CKO     135   59   11  ...     52   22.7   13    2   0.44  0.13  0.01  0.07    35   12    1    13    44
CKI      17   14    6  ...     28   23.1    6    2   0.82  0.27  0.04  0.15    56   23    4    27    68
DRO      63   12    3  ...     75   14.3    6  ...   0.19  0.07  0.00  0.09    17    6  ...     6    23
DRI       2    7  ...  ...     54  189.0    7    1   3.50  0.12  0.02  0.13    97   12    2    13    97
DSO     546  198   26    1    553  200.5   51    6   0.36  0.07  0.01  0.03    30    7    1     7    36
DSI      39   34    6  ...    246  214.5   31    1   0.87  0.13  0.00  0.06    58   12  ...    12    63
DSC     ...  ...  ...  ...     20    0.0    5    2   0.00  0.25  0.10  0.22   ...   22   10    30    30
DAO    1775  784  124    4    288  127.2   28    2   0.44  0.07  0.00  0.02    36    7  ...     7    40
DAI     391  419   70    6    324  347.2   58    7   1.07  0.18  0.02  0.04    66   16    2    18    72
DAC       8    5    3  ...     46   28.8   12    1   0.62  0.28  0.02  0.14    46   24    2    26    60
DHO      46   26    1    1     43   24.3   11  ...   0.57  0.13  0.01  0.11    43   13    1    14    51
DHI      11   14    1  ...     41   52.2    3  ...   1.27  0.08  0.00  0.14    72    7  ...     7    74
DHC     ...  ...  ...  ...      6    0.0    2  ...   0.00  0.33  0.00  0.41   ...   28  ...    28    28
DKO     217  178   55    5     43   35.3   14    2   0.82  0.27  0.03  0.06    56   23    3    25    67
DKI     223  288   69    6     88  113.7   42    6   1.29  0.36  0.04  0.06    73   30    4    33    81
DKC      57   93   35    5    100  163.2   72   10   1.63  0.68  0.10  0.08    80   49    9    54    91
ESO      95   37    6  ...     82   31.9   14  ...   0.39  0.11  0.00  0.08    32   11  ...    11    39
ESI      18   33    1  ...     78  143.0   22    2   1.83  0.24  0.02  0.10    84   21    2    23    88
EAO     459  267   61  ...     47   27.3   10    4   0.58  0.14  0.01  0.04    44   13    1    14    52
EAI     295  370   83    2     82  102.8   48    1   1.25  0.35  0.01  0.05    71   29    1    30    80
EAC       3    5    1  ...     17   28.3    6    3   1.67  0.35  0.15  0.22    81   30   14    39    89
EHO      42   31    6  ...     39   28.8    6  ...   0.74  0.15  0.00  0.11    52   14  ...    14    59
EHI      15   24    6  ...     45   72.0   28    4   1.60  0.57  0.07  0.13    80   43    6    47    89
EHC       2    9  ...  ...      4   18.0    8  ...   4.50  1.33  0.00  0.41    99   74  ...    74   100
EKO     185  173   35    3     52   48.6   20    1   0.94  0.23  0.02  0.06    61   21    2    22    69
EKI     423  703  173   23     81  134.6  103   11   1.66  0.55  0.07  0.04    81   42    7    46    90
EKC     103  278  132   17     63  170.0  149   21   2.70  1.69  0.23  0.08    93   82   20    85    99
FRI     ...  ...  ...  ...      2    0.0    1  ...   0.00  0.50  0.00  0.71   ...   39  ...    39    39
FSO      14    9    3  ...     13    8.4    6    1   0.64  0.33  0.04  0.19    47   28    4    31    64
FSI       6   12  ...  ...      8   16.0   15  ...   2.00  1.07  0.00  0.27    86   66  ...    66    95
FAO      73   63   16  ...      3    2.6  ...  ...   0.86  0.21  0.00  0.11    58   19  ...    19    66
FAI      91  106   35    3     12   14.0    8  ...   1.16  0.42  0.03  0.10    69   34    3    36    80
FHO       9    5    1  ...     10    5.6  ...  ...   0.56  0.05  0.00  0.23    43    5  ...     5    46
FHI      10   17    9  ...     18   30.6   15  ...   1.70  0.86  0.00  0.19    82   58  ...    58    92
FHC     ...  ...  ...  ...      5    0.0    4  ...   0.00  0.80  0.00  0.45   ...   55  ...    55    55
FKO      97  165   29    1     19   32.3    6  ...   1.70  0.30  0.01  0.09    82   26    1    27    87
FKI     235  517  161   17     47  103.4  106   17   2.20  0.95  0.12  0.06    89   61   11    66    96
FKC      93  233  146   24     27   67.6   39   13   2.51  1.54  0.31  0.09    92   79   27    84    99

^a Only includes classifications producing ≥1 C-, M-, or X-class flare in either time range.
^b From Kildahl (1980).
^c Non-integer flare numbers result from use of observed C-class rates from SWPC (1988-1996).
^d "Above GOES X1.0" is equivalent to "In GOES Class X".

Poisson probabilities for a McIntosh class to produce at least one flare within a 24 hr interval are
displayed in Columns 14-16 of Table 2 for the C-, M-, and X-classes, with those for flaring ≥M1.0
(M- and X-classes) and ≥C1.0 (C-, M-, and X-classes) in Columns 17-18.
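
A short sketch of Equation (6) applied to one table row follows. It is illustrative only: the variable names are hypothetical, and it assumes that the "above" levels are obtained by summing the rates of the contributing GOES classes, which reproduces the tabulated EKC percentages.

```python
import math


def prob_at_least_one(rate: float) -> float:
    """Equation (6): probability of one or more flares, 1 - exp(-rate)."""
    return -math.expm1(-rate)


# EKC row of Table 2: regions and flares summed over both periods.
n_tot = 103 + 63
flares = {"C": 278 + 170.0, "M": 132 + 149, "X": 17 + 21}

# Average flares per 24 hr for each GOES class.
rates = {goes_class: count / n_tot for goes_class, count in flares.items()}
# Assumption: rates of the contributing classes add for the "above" levels.
rates[">=M1.0"] = rates["M"] + rates["X"]
rates[">=C1.0"] = rates["C"] + rates["M"] + rates["X"]

percent = {level: round(100 * prob_at_least_one(rate)) for level, rate in rates.items()}
print(percent)
```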

4.2. Contingency Table Construction 

Two sets of binary (yes/no) information are required to build the forecast contingency tables: flare
truth and flare prediction. The first is achieved by cross-referencing the SWPC-extracted AR and GOES
event lists over the testing period (1996-2010). For each AR observed each day, the list of
AR-associated flares within 24 hr of the McIntosh class being issued is searched for the NOAA number
of that AR (i.e., the same UT day; McIntosh classes are published at 00:30 UT based on data before
00:00 UT). Flare truth is set to "no" for ARs when no flares occurred with peak magnitude at the
appropriate level or "yes" when ≥1 flare occurred. This results in the number of flare ARs, N_fl,
being 3667, 810, and 92 for
TABLE 3
Flare Forecast Contingency Table and Skill Score Dependence on Threshold Poisson Probability

Prob. Flaring In GOES      Contingency Table Elements         Skill Scores      FN/FP
M-class Within 24 hr (%)     TP      FN       FP       TN      HSS      TSS
  0                         810       0    21466        0    0.000    0.000      0.00
 10                         568     242     3832    17634    0.167    0.523      0.06
 20                         452     358     2163    19303    0.221    0.457      0.17
 30                         330     480     1129    20337    0.256    0.355      0.43
 40                         288     522      850    20616    0.264    0.316      0.61
 50                         209     601      471    20995    0.256    0.236      1.28
 60                         202     608      458    21008    0.250    0.228      1.33
 70                         149     661      308    21158    0.215    0.170      2.15
 80                          59     751      173    21293    0.099    0.065      4.34
 90                           0     810        0    21466    0.000    0.000         ∞
100                           0     810        0    21466    0.000    0.000         ∞



is random); varying the threshold maintains the sample ratio, but alters the forecast success ratio.

Figure 1 also shows TSS peaking at FN/FP ≈ N_fl/N_nf (panels (c) and (d)), where N_nf is the number of
no-flare ARs (N_nf = 22276 - N_fl). This indicates that the TSS measure of accuracy is maximized when
the fractional frequency of incorrect predictions for flare ARs equals the fractional frequency of
incorrect predictions for no-flare ARs, FN/N_fl = FP/N_nf. This dependence on the fractional form of
incorrect frequencies again illustrates that forecasts with differing sample ratios will keep the
same TSS value: changes in FN or FP are absorbed by corresponding changes in N_fl or N_nf
(Equation (2)). Note that HSS = TSS when N_fl = N_nf, but this is seldom the case in flare
forecasting as flares are rare events.
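
The location of the skill score peaks can be illustrated with the M-class contingency table elements of Table 3. The sketch below is an assumption-laden illustration, not the analysis code of this work: function names are hypothetical, only the 10% threshold steps shown in Table 3 are used (the full analysis uses 1% steps), and zero elements follow from TP + FN = 810 and FP + TN = 21466.

```python
import math

# (threshold %, TP, FN, FP, TN) for M-class forecasts, as listed in Table 3.
TABLE_3 = [
    (0, 810, 0, 21466, 0),
    (10, 568, 242, 3832, 17634),
    (20, 452, 358, 2163, 19303),
    (30, 330, 480, 1129, 20337),
    (40, 288, 522, 850, 20616),
    (50, 209, 601, 471, 20995),
    (60, 202, 608, 458, 21008),
    (70, 149, 661, 308, 21158),
    (80, 59, 751, 173, 21293),
    (90, 0, 810, 0, 21466),
    (100, 0, 810, 0, 21466),
]


def hss(tp, fn, fp, tn):
    """Heidke skill score, Equation (1)."""
    return 2.0 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))


def tss(tp, fn, fp, tn):
    """True skill statistic, Equation (2)."""
    return tp / (tp + fn) - fp / (fp + tn)


def failure_ratio(fn, fp):
    """FN/FP, taken as infinite when no false positives occur."""
    return fn / fp if fp else math.inf


best_hss = max(TABLE_3, key=lambda row: hss(*row[1:]))
best_tss = max(TABLE_3, key=lambda row: tss(*row[1:]))
n_fl, n_nf = 810, 22276 - 810  # flare and no-flare AR samples

print(f"HSS peaks at {best_hss[0]}%: FN/FP = {failure_ratio(best_hss[2], best_hss[3]):.2f}")
print(f"TSS peaks at {best_tss[0]}%: FN/FP = {failure_ratio(best_tss[2], best_tss[3]):.2f}, "
      f"N_fl/N_nf = {n_fl / n_nf:.3f}")
```

On this coarse grid the HSS maximum falls at the 40% threshold, the tabulated row closest to FN/FP = 1 from below, while the TSS maximum falls at the 10% threshold, the tabulated row with FN/FP closest to N_fl/N_nf.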

5.2. Inter-forecast Skill Score Comparison 

Flare forecasting studies do not usually quote values of TSS and rarely use equal flare/no-flare
sample sizes that make HSS equal TSS.^4 Unfortunately, most do not show contingency tables that would
enable TSS or other unpublished measures to be calculated. Optimum values of TSS and HSS achieved by
Poisson probabilities in Section 5.1 are compared to other methods in Table 4, restricted to those
with a contingency table (or values one can be inferred from) and those quoting HSS. Other measures
used in flare forecasting include the probability of detection: POD = TP/[TP + FN]; the false alarm
ratio: FAR = FP/[TP + FP]; and the odds ratio or accuracy: ACC = [TP + TN]/[TP + FN + FP + TN].
Table 4 includes these to allow broad assessment of each method.
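
For concreteness, the three additional measures can be evaluated as in the sketch below. The function names are illustrative, and the inputs are the M-class contingency table elements listed in Table 3 for a 10% threshold, not the optimum thresholds quoted in Table 4.

```python
def pod(tp: float, fn: float) -> float:
    """Probability of detection: fraction of observed flares that were forecast."""
    return tp / (tp + fn)


def far(tp: float, fp: float) -> float:
    """False alarm ratio: fraction of "flare" forecasts with no flare observed."""
    return fp / (tp + fp)


def acc(tp: float, fn: float, fp: float, tn: float) -> float:
    """ACC as defined in the text: fraction of all forecasts that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)


# M-class forecasts at a 10% threshold (elements as listed in Table 3).
tp, fn, fp, tn = 568, 242, 3832, 17634
print(f"POD = {pod(tp, fn):.3f}, FAR = {far(tp, fp):.3f}, ACC = {acc(tp, fn, fp, tn):.3f}")
```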

5.2.1. Performance for Separate Flare-magnitude Classes 

In forecasting flares in the separate GOES flare classes over 24 hr intervals, the ordinal logistic
regression model (4) of Song et al. (2009) yields the highest TSS values for C- and M-classes, while
the optimum TSS for Poisson probabilities is highest for X-class. Song et al. (2009) convert flare
probabilities into predictions using static thresholds of 50% for C- and M-class events and 25% for
X-class events. Improved performance might be achieved by the Song et al. (2009) technique by
investigating its dependence on the prediction threshold, as studied here. Unfortunately, the
Song et al. (2009) results are the most susceptible to noise (given a small sample of 55 ARs^5) and
weighted toward successful prediction of flaring ARs, since their samples of each flare-magnitude
class have higher proportions of flaring ARs (36%, 31%, and 13% for C-, M-, and X-classes) than
typically observed (16%, 4%, and 0.4% in cycle 23). It is unclear how this method would perform
operationally when non-flaring ARs outnumber flaring ARs and successfully predicting no-flare periods
has increased importance. The significantly lower performance of Yuan et al. (2010) in TSS and HSS is
surprising, given that it adds support vector machine classification to the Song et al. (2009)
technique.

^4 This behaviour is good practice given the rarity of flare events. Forcing a balance between N_nf
and N_fl results in discarding ~80%, ~96%, and >99% of the available N_nf sample when considering
events ≥C1.0, ≥M1.0, and ≥X1.0, respectively.

^5 Changing 1 TP into FN (and vice versa) yields ±0.050, ±0.059, and ±0.143 in TSS for C-, M-, and
X-class forecasts, respectively.



Note. (The entire table is available online in machine-readable
form. A portion is shown for guidance regarding its form and content.)

C-, M-, and X-class events, respectively. Similarly, N_fl is 858 and 3912 for ARs with flares ≥M1.0
and ≥C1.0, respectively.

The second set of information is achieved by applying a flare/no-flare discriminating threshold to
the Poisson probabilities achieved in Section 4.1. All ARs in the test period had the corresponding
McIntosh class flare probabilities (Table 2) assigned to the 24 hr interval after observation.
Probabilities were converted into predictions by choosing a threshold (varying in 1% increments from
0% to 100%) and predicting "no flare" for values below the threshold and "flare" for those at or
above the threshold.

The contingency table elements (Section 2 and Table 1) are the number of each pair combination of
flare truth and prediction. The variation of the HSS and TSS measures is shown in Figure 1 and
Table 3 for separate forecasts of C-, M-, and X-class events and forecasts ≥M1.0 and ≥C1.0. It is
worth noting that the approach applied here changes occurrences of TP to FN and FP to TN as the
threshold probability rises ("flare" predictions become "no flare" predictions, but flare truth is
unchanged).
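
The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the six-element sample are hypothetical toy values, not data from this work.

```python
from typing import NamedTuple, Sequence


class Counts(NamedTuple):
    tp: int
    fn: int
    fp: int
    tn: int


def contingency(prob_percent: Sequence[float], flared: Sequence[bool],
                threshold: float) -> Counts:
    """Count truth/prediction pairs, predicting "flare" at or above the threshold."""
    tp = fn = fp = tn = 0
    for probability, truth in zip(prob_percent, flared):
        predicted = probability >= threshold
        if predicted and truth:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return Counts(tp, fn, fp, tn)


# Hypothetical toy sample: Poisson probabilities (%) and observed flare truth.
probs = [82, 20, 7, 54, 1, 33]
truth = [True, False, False, True, True, False]

for threshold in (0, 30, 100):
    print(threshold, contingency(probs, truth, threshold))
```

Raising the threshold in this toy sample moves entries from TP to FN and from FP to TN, while TP + FN and FP + TN stay fixed.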

5. RESULTS & DISCUSSION 

5.1. Skill Score Variation With Prediction Threshold 

Figure 1 shows HSS peaking at FN/FP ≈ 1 (panels (a) and (b)). This indicates that the HSS measure of
forecast accuracy is maximized^3 when the absolute frequencies of incorrect predictions are equal,
FN ≈ FP. Sensitivity to FN/FP confirms the HSS dependence on sample ratio (Equation (3) here;
Woodcock 1976). Table 1 shows that TP and FN increase if additional flaring ARs are included (FN/FP
increases and unity occurs at higher thresholds). Conversely, FP and TN increase if additional
no-flare ARs are included (FN/FP decreases and unity occurs at lower thresholds). Note that varying
the number of ARs included in the verification test does not have the same effect as varying the
threshold used to construct the contingency tables: adding ARs alters the sample ratio, but
maintains the forecast success ratio (if the added sample

^3 The concept of a peak value of skill score is only possible here because forecast performance is
altered by varying the threshold. Methods without a variable threshold can only achieve one value.
Fig. 1. Threshold probability variation of HSS (a-b), TSS (c-d), and FN/FP (e-f). Curves in panels (a), (c), and (e) are forecasts over
24 hr of at least one C-class (solid), M-class (dashed), or X-class (dotted) flare, while those in panels (b), (d), and (f) are forecasts of at
least one flare ≥C1.0 (solid), ≥M1.0 (dashed), and ≥X1.0 (dotted). Arrows in panels (a) and (b) mark thresholds where FN/FP ≈ 1, while
those in panels (c) and (d) mark thresholds where FN/FP ≈ N_fl/N_nf. Only FN/FP ≈ N_fl/N_nf is marked for X-class, as 0.35 is the
largest finite value of FN/FP.



It is worth noting that neural network operational forecasting of McIntosh classes by
Colak & Qahwaji (2009) yields an HSS between that found here and Song et al. (2009) for all flare
classes, but published values do not permit TSS calculation.

For X-class flares, the optimal TSS value for Poisson probabilities over 24 hr intervals is higher
than that from the superposed-epoch analysis of Mason & Hoeksema (2010) over 6 hr intervals. The
Mason & Hoeksema (2010) technique is segmented by predicting "no flare" for ARs with a magnetic
quantity change over the previous 40 hr below one threshold and "flare" for ARs with changes above a
second higher threshold. The forecast success would likely decrease if the unpredicted mid-range AR
population were included. Note the optimum TSS found here has large FAR because it results from a
yes/no prediction threshold of 1%, meaning that X-class flares are always predicted for all McIntosh
classifications that historically produced any X-class activity.

5.2.2. Performance above the M1.0 Level

In forecasting flares ≥M1.0, sequential supervised learning by Yu et al. (2009) and the predictor
team work of Huang et al. (2010) yield the highest HSS values that equate to TSS from equal flare and
no-flare sample sizes. However, they predict cumulative flare importance equivalent to at least one
M1.0 event in a 48 hr interval (e.g., 10 C1.0, 5 C2.0, 2 C5.0). This raises uncertainty about these
good skill scores representing the successful forecasting of events ≥M1.0, as forecasting multiple
C-class events from an AR may be easier than single M-class events. More importantly, both works only
consider ARs that produce at least one flare ≥C1.0 in their life. This segmentation weakens their
interpretation for operational purposes (similar to the case of Song et al. (2009) in
Section 5.2.1), as the number of AR no-flare periods considered in Yu et al. (2009) and
Huang et al. (2010) is severely reduced by excluding all completely non-flaring NOAA numbers. It is
worth noting that the optimum TSS achieved here equals that for the application of 1 decision tree in
Huang et al. (2010) (with HSS, hence TSS, of ~0.54).

The highest HSS achieved in the discriminant analysis study of Barnes & Leka (2008) was found using
total unsigned magnetic flux. However, the value is low (notably also lower than the optimum HSS
found here) and likely due to the overlap between flaring and non-flaring AR-parameter
distributions. However, proper comparison to the performance of Poisson probabilities is not
possible as TSS values from Barnes & Leka (2008) are
TABLE 4
Inter-forecast Skill Score Comparison

Forecast      Interval   Verification Measure                                  Reference
Flare Level   (hr)       TSS     FN/FP    HSS     POD     FAR     ACC
C-class       24         ...     ...      0.493   0.772   0.319   0.811   Colak & Qahwaji (2009)
              24         0.650   0.429    0.623   0.850   0.292   0.818   Song et al. (2009)^a
              24         0.090   7.000    0.116   0.138   0.471   0.722   Yuan et al. (2010)
              24         0.443   0.176    0.296   0.737   0.670   0.711   This work: optimum TSS
              24         0.399   0.836    0.384   0.513   0.531   0.824   This work: optimum HSS
M-class       24         ...     ...      0.470   0.865   0.688   0.944   Colak & Qahwaji (2009)
              24         0.621   6.000    0.676   0.647   0.083   0.873   Song et al. (2009)^a
              24         0.054   1.963    0.061   0.221   0.643   0.652   Yuan et al. (2010)
              24         0.526   0.070    0.177   0.693   0.864   0.829   This work: optimum TSS
              24         0.272   1.002    0.273   0.299   0.701   0.949   This work: optimum HSS
X-class       24         ...     ...      0.169   0.917   0.967   0.981   Colak & Qahwaji (2009)
              24         0.693   2.000    0.739   0.714   0.167   0.945   Song et al. (2009)^a
              24         0.160   3.000    0.205   0.206   0.562   0.843   Yuan et al. (2010)
              6          0.312   0.005    0.008   0.617   0.992   0.694   Mason & Hoeksema (2010)^b
              24         0.740   0.005    0.049   0.859   0.971   0.881   This work: optimum TSS
              24         0.241   0.348    0.142   0.250   0.896   0.988   This work: optimum HSS
≥M1.0         24         ...     ...      0.153   ...     ...     0.922   Barnes & Leka (2008)^c
              48         0.650   1.105    0.650   0.817   0.169   0.825   Yu et al. (2009)^d
              48         ~0.66   ...      ~0.66   ~0.90   ...     ...     Huang et al. (2010)
              24         0.539   0.072    0.190   0.704   0.854   0.830   This work: optimum TSS
              24         0.273   1.089    0.280   0.298   0.684   0.948   This work: optimum HSS
≥C1.0         24         ...     ...      0.512   0.814   0.301   0.805   Colak & Qahwaji (2009)
              24         0.641   0.952    0.636   0.662   0.349   0.961   Ahmed et al. (2012)^e
              24         0.456   0.178    0.315   0.753   0.649   0.712   This work: optimum TSS
              24         0.412   0.942    0.407   0.520   0.495   0.826   This work: optimum HSS

^a Model (4).
^b Reported HSS contains miscalculation of expected correct random forecasts (J. P. Mason 2011, private communication).
^c Total unsigned magnetic flux.
^d Contingency table provided by X. Huang (2011, private communication).
^e Temporally segmented training and operational testing (test still spatially segmented to ARs ≤60° from disk centre).
^f Contingency table calculated from reported forecast measures.

not available. 

5.2.3. Performance above the C1.0 Level

Finally, in forecasting flares ≥C1.0 in 24 hr intervals, the application of neural networks by
Ahmed et al. (2012) to magnetic properties with semi-operational testing yields the highest TSS.
Semi-operational refers to no segmentation being applied based on flare history, while spatial
segmentation was applied (only ARs within 60° of disk centre). Optimum TSS values show that Poisson
probabilities do not perform as well as the machine learning of Ahmed et al. (2012), possibly because
the Poisson probabilities are applied here in a truly operational manner (e.g., ARs near the limb may
be misclassified by foreshortening effects and inappropriately predicted). It is interesting that the
neural network system of Colak & Qahwaji (2009) does not perform significantly better than the
application of Poisson probabilities, but this is based on HSS as TSS is unavailable for their work.

6. CONCLUSIONS 

To be operationally practical, flare forecasts should provide predictions for all ARs irrespective of
properties or flare history (i.e., no minimum criteria in selecting ARs for flare prediction). We have
presented the variation of forecast verification measures with the threshold Poisson probability used
to define "flare" and "no flare" predictions. Forecasts for different X-ray flare levels from all
NOAA ARs over 1996 August 1 to 2010 December 31 were tested against observed flares.

Optimized forecasts from Poisson flare probabilities are found to perform to similar standards as
some more sophisticated methods (e.g., in forecasting events ≥M1.0). However, the relatively low
levels of optimum skill score (HSS < 0.4 and TSS^6 < 0.5) lend further support to the need to use
flaring rate distributions (in, e.g., a Bayesian methodology like Wheatland 2005) rather than
averages over an AR class. This will be a focus of future work in the construction of Bayesian prior
distributions of AR-property-dependent flare rates.

Providing forecasts and quantifying their performance will be acutely necessary as we approach the
activity maximum of cycle 24. It is foreseen that specific forecast requirements may be targeted by
careful consideration of skill scores and particular contingency table elements, e.g., the threshold
for interpreting flare probabilities as yes/no forecasts could be tailored to achieve relative
failure ratios (FN/FP) within the tolerance of various groups in the scientific and space weather
communities. However, complete flare forecasts will require a deeper physical understanding of
magnetic energy release and partitioning of energy between flare emission at different temperatures,
acceleration of CMEs, and acceleration of high-energy particles (Emslie et al. 2005).

In closing, it is imperative that the performance of flare forecasting methods with differing
flare/no-flare sample ratios is compared in a suitable manner. This requires the use of a
verification measure that is not sensitive to the flare/no-flare sample ratio. We have highlighted an
issue with the commonly adopted HSS and instead propose the sample ratio invariant TSS for the
reliable comparison of flare forecasts.

^6 Optimum TSS of 0.74 is found here for X-class at a threshold of 1%, but this results in severe
overprediction and large FAR.